Data Quality & Coverage Gate (Interim N)
This gate guards against fitting misleading models when design cells are missing or degenerate. The current data are interim (N=26); treat all reported effects as descriptive.
Interim Status: All inferential models in this document are fit on N=26 (interim sample), not N=48. Any statements about N=48 refer to the planned final sample, not the data used here.
Participant counts (interim). Note: `df` denotes the filtered analysis data frame (correct, non-practice trials with RT ∈ [150, 6000] ms), not degrees of freedom.
|Count metric | N|
|:---|---:|
|Total participants (raw) | 26|
|Participants with any valid trials | 26|
|Participants in `df` (correct, RT-filtered) | 26|
Condition coverage (modality × ui_mode × pressure)
|Modality |UI Mode | Pressure| Trials| Participants| Missing|Status |
|:--------|:-------|--------:|------:|------------:|:-------|:------|
|hand |static | 0| 675| 25|FALSE |OK |
|hand |static | 1| 702| 26|FALSE |OK |
|hand |adaptive | 0| 675| 25|FALSE |OK |
|hand |adaptive | 1| 702| 26|FALSE |OK |
|gaze |static | 0| 645| 24|FALSE |OK |
|gaze |static | 1| 701| 26|FALSE |OK |
|gaze |adaptive | 0| 700| 26|FALSE |OK |
|gaze |adaptive | 1| 675| 25|FALSE |OK |
All factors have ≥2 levels in the interim data.
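The gate logic above (every factor retains ≥2 levels; no crossed cell is empty) can be sketched compactly. The following Python sketch uses illustrative names (`coverage_gate`, `cells`), not the actual analysis code:

```python
from itertools import product

def coverage_gate(cells, factors):
    """Check a factorial design for empty cells and degenerate factors.

    cells   : dict mapping a (level, level, ...) tuple -> participant count
    factors : dict mapping factor name -> list of observed levels
    Returns (ok, problems), where problems lists empty cells and 1-level factors.
    """
    problems = []
    # Every factor must have at least 2 observed levels to be estimable.
    for name, levels in factors.items():
        if len(levels) < 2:
            problems.append(f"factor '{name}' has <2 levels")
    # Every crossed cell must contain at least one participant.
    for cell in product(*factors.values()):
        if cells.get(cell, 0) == 0:
            problems.append(f"empty cell {cell}")
    return (not problems, problems)
```

Applied to the interim coverage table, all eight cells hold 24–26 participants, so the gate passes.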
Blocks logged per participant
|Participant | Blocks|
|:-----------|------:|
|P002 | 7|
|P003 | 8|
|P004 | 8|
|P006 | 8|
|P007 | 8|
|P008 | 8|
|P009 | 8|
|P010 | 8|
|P011 | 8|
|P014 | 8|
|P015 | 8|
|P018 | 8|
|P019 | 8|
|P020 | 8|
|P022 | 8|
|P023 | 8|
|P024 | 8|
|P025 | 8|
|P029 | 8|
|P037 | 4|
|P038 | 8|
|P039 | 8|
|P040 | 8|
|P041 | 8|
|P042 | 8|
|P049 | 8|
Current dataset: N=26 participants (interim sample). All inferential results in this document are preliminary and will be re-estimated at N≈48. Sections 16 (LBA) and 17 (Control Theory) document planned analyses that have not yet been implemented at the current interim N.
1. Executive Summary
This report analyzes 26 participants performing Fitts’ law pointing tasks across two input modalities (Hand, Gaze) and two UI modes (Static, Adaptive).
Note on Participant Exclusions: Seven participants (P002, P003, P007, P008, P015, P039, P040) were excluded from the planned 2×2×2 factorial dataset due to a data logging error that incorrectly recorded pressure conditions. The bug was fixed on December 8, 2025 (commit 04758db), and seven replacement participants (P049-P055) will be collected to reach the planned final sample of N=48. All analyses in this interim report use N=26 participants with complete data across all conditions.
All statistical models in this document are fit on N=26 (interim sample), not N=48. See the Data Quality Notes section and EXCLUSION_CRITERIA.md for details.
Results Snapshot (Interim, N = 26)
RQ1 contrasts (adaptive - static; interim descriptive)
|Modality | ΔTP| ΔMT| ΔError rate|
|:--------|----------:|----------:|----------:|
|hand | 0.0343380| -0.0174301| 0.0014524|
|gaze | -0.1029755| 0.0494700| -0.0087236|
*Note:* These contrasts are descriptive only; no inferential claims are made at N=26.
RQ2 snapshot: Overall TLX (interim)
|Modality |UI Mode | Mean TLX|
|:--------|:--------|--------:|
|gaze |adaptive | 48.4|
|gaze |static | 46.9|
|hand |adaptive | 41.9|
|hand |static | 42.2|
RQ3 manipulation check: width scaling (interim)
|Modality |UI Mode | Mean `width_scale_factor`| SD|
|:--------|:--------|--------:|---:|
|gaze |adaptive | 1| 0|
|gaze |static | 1| 0|
|hand |adaptive | 1| 0|
|hand |static | 1| 0|
*Note:* In the current build, width scaling was not activated; all recorded `width_scale_factor` values equal 1.0. RQ3 will be revisited once adaptive width scaling is enabled.
Key Findings
- Total Trials Analyzed: 4734 valid trials (correct responses, RT 150–6000 ms)
- Total Trials Collected: 5481
- Overall Error Rate: 14%
- Mean Throughput: 3.3 bits/s (SD = 1.05)
- Mean Movement Time: 1.177s (SD = 0.47s)
2. Demographics
Sample Size: N = 26 participants.
Overall Demographics
| N| Mean Age| SD Age|Age Range | | |
|--:|-------:|------:|:---------|--:|--:|
| 26| 30.9| 7.4|18–54 | 0.7| 2|
By Gender
|Gender | n| Mean Age| SD Age| |
|:-------|--:|-------:|------:|--:|
|female | 9| 32.1| 8.8| 0|
|male | 17| 30.3| 6.7| 1|
Gaming Status
Participants were primarily non-gamers (median self-reported gaming = 0 hours/week; only 3.8% (1/26) reported ≥5 hrs/week).
3. Primary Analysis: Throughput
Research Question: Does the Adaptive UI improve performance (Throughput) compared to Static, especially for Gaze?
Sample Size: N = 26 participants with valid throughput data.
Interim Analysis Note: At this interim N, we observe a large main effect of modality (hand > gaze) on throughput, but no reliable evidence that the Adaptive UI improves TP relative to Static, nor clear interactions with pressure. Interaction effects are treated as exploratory and will be revisited at N=48.
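Throughput here follows the effective-index formulation noted later in this report (TP = ID/RT, with the effective index of difficulty IDe = log₂(D/We + 1)). A minimal sketch, assuming the ISO 9241-9-style computation (the function name is illustrative; the analysis pipeline may differ in detail):

```python
import math

def effective_throughput(distance_px, we_px, movement_time_s):
    """Fitts throughput (bits/s) from the effective index of difficulty:
    IDe = log2(D / We + 1);  TP = IDe / MT."""
    ide = math.log2(distance_px / we_px + 1.0)
    return ide / movement_time_s

# e.g. D = 300 px, We = 33 px, MT = 1.11 s gives roughly 3.0 bits/s,
# on the same scale as the condition means reported below.
```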
Summary Statistics
Throughput (bits/s) by Condition (N = 26 participants)
|Modality |UI Mode | Pressure| N (pid)| N (obs)| Mean| SD| Median| Q1| Q3|
|:--------|:-------|--------:|-------:|-------:|----:|----:|------:|----:|----:|
|hand |static | 0| 25| 75| 3.50| 0.90| 3.44| 2.89| 3.99|
|hand |static | 1| 26| 78| 3.53| 0.96| 3.51| 2.90| 4.10|
|hand |adaptive | 0| 25| 75| 3.64| 0.97| 3.66| 2.95| 4.32|
|hand |adaptive | 1| 26| 78| 3.46| 0.93| 3.42| 2.69| 4.11|
|gaze |static | 0| 23| 69| 3.28| 1.27| 3.08| 2.48| 4.06|
|gaze |static | 1| 25| 74| 2.95| 0.88| 2.95| 2.36| 3.38|
|gaze |adaptive | 0| 26| 77| 3.05| 1.13| 2.82| 2.32| 3.76|
|gaze |adaptive | 1| 25| 75| 2.96| 1.06| 2.69| 2.26| 3.37|
Statistical Model Results
Planned Sample Size & Power
The throughput analysis was designed for a within-subjects 2×2×2 factorial (modality × UI mode × pressure). Our primary effect of interest is the UI mode main effect (adaptive vs static), which we expect to be medium in size (dz ≈ 0.4–0.6). Standard repeated-measures power calculations and guidelines (Cohen, 1988; Brysbaert, 2019) indicate that N ≈ 50 participants is sufficient for 80% power to detect dz ≈ 0.40. We therefore set N = 48 (six complete Williams sequences) as the primary design target, with the option to extend to N = 64 (eight sequences) if recruitment permits. Given the large number of trials per condition and the mixed-effects model (random intercepts per participant), this sample size is expected to provide high power for UI mode and modality main effects, while interactions are treated as secondary and more exploratory (Kumle et al., 2021; Matuschek et al., 2017).
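The N ≈ 50 figure for dz ≈ 0.40 follows from the standard normal approximation for a two-sided paired t-test. A quick sketch (slightly liberal relative to exact t-based power; function name illustrative):

```python
import math
from statistics import NormalDist

def n_for_paired_dz(dz, alpha=0.05, power=0.80):
    """Approximate participants needed for a two-sided paired t-test,
    via the normal approximation: N ~ ((z_{1-alpha/2} + z_{power}) / dz)^2."""
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha / 2) + z(power)) / dz) ** 2)

# n_for_paired_dz(0.40) -> 50, matching the "N ~ 50 for dz ~ 0.40" planning figure.
```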
### Model: TP ~ modality * ui_mode * pressure + (1 | pid)
**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈601) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (26).
**Data Summary:** 26 participants, 601 trials, 8 conditions, minimum 69 trials per condition.
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
modality 33.741 33.741 1 575.49 53.8459 7.422e-13 ***
ui_mode 0.205 0.205 1 575.50 0.3275 0.56738
pressure 2.735 2.735 1 575.82 4.3651 0.03712 *
modality:ui_mode 0.807 0.807 1 575.50 1.2880 0.25689
modality:pressure 0.884 0.884 1 575.78 1.4112 0.23534
ui_mode:pressure 0.001 0.001 1 575.84 0.0015 0.96892
modality:ui_mode:pressure 1.624 1.624 1 575.84 2.5916 0.10798
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Interim Analysis Note:** At N=26, the model is underpowered for detecting 3-way interactions. Any non-significant interaction effects should be treated as exploratory and will be revisited at N=48.
#### Model Summary
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: TP ~ modality * ui_mode * pressure + (1 | pid)
Data: df_iso
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
1515.4 1559.3 -747.7 1495.4 591
Scaled residuals:
Min 1Q Median 3Q Max
-2.6783 -0.7692 0.0250 0.6116 4.3216
Random effects:
Groups Name Variance Std.Dev.
pid (Intercept) 0.3871 0.6222
Residual 0.6266 0.7916
Number of obs: 601, groups: pid, 26
Fixed effects:
Estimate Std. Error df t value
(Intercept) 3.48043 0.15269 55.17874 22.793
modalitygaze -0.22036 0.13244 575.66759 -1.664
ui_modeadaptive 0.14328 0.12927 575.11157 1.108
pressure1 0.04843 0.12829 575.73630 0.377
modalitygaze:ui_modeadaptive -0.35561 0.18469 575.71842 -1.925
modalitygaze:pressure1 -0.36256 0.18438 575.25656 -1.966
ui_modeadaptive:pressure1 -0.21369 0.18105 575.11157 -1.180
modalitygaze:ui_modeadaptive:pressure1 0.41727 0.25920 575.84461 1.610
Pr(>|t|)
(Intercept) <2e-16 ***
modalitygaze 0.0967 .
ui_modeadaptive 0.2682
pressure1 0.7060
modalitygaze:ui_modeadaptive 0.0547 .
modalitygaze:pressure1 0.0497 *
ui_modeadaptive:pressure1 0.2384
modalitygaze:ui_modeadaptive:pressure1 0.1080
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) mdltyg u_mddp prssr1 mdlt:_ mdlt:1 u_md:1
modalitygaz -0.413
ui_modedptv -0.423 0.488
pressure1 -0.430 0.492 0.504
mdltygz:_md 0.294 -0.717 -0.700 -0.350
mdltygz:pr1 0.297 -0.716 -0.351 -0.693 0.514
u_mddptv:p1 0.302 -0.348 -0.714 -0.706 0.500 0.491
mdltygz:_:1 -0.207 0.510 0.499 0.488 -0.713 -0.712 -0.698
#### Effect Size: Hand vs. Gaze (Collapsed Over UI Mode and Pressure)
Table: Estimated Marginal Means for Throughput by Modality (collapsed over UI mode and pressure)
|Modality | Mean TP (bits/s)| 95% CI Lower| 95% CI Upper|
|:--------|----------------:|------------:|------------:|
|Hand | 3.52| 3.25| 3.79|
|Gaze | 3.05| 2.78| 3.32|
**Difference (Hand - Gaze):** 0.48 bits/s
#### Pairwise Comparisons (Holm-adjusted)
contrast estimate SE df
hand static pressure0 - gaze static pressure0 0.2203574 0.1332586 582.65
hand static pressure0 - hand adaptive pressure0 -0.1432760 0.1300617 582.09
hand static pressure0 - gaze adaptive pressure0 0.4326909 0.1295218 582.76
hand static pressure0 - hand static pressure1 -0.0484280 0.1290894 582.73
hand static pressure0 - gaze static pressure1 0.5344853 0.1310262 583.10
hand static pressure0 - hand adaptive pressure1 0.0219822 0.1290894 582.73
hand static pressure0 - gaze adaptive pressure1 0.5432359 0.1300617 582.09
gaze static pressure0 - hand adaptive pressure0 -0.3636334 0.1332586 582.65
gaze static pressure0 - gaze adaptive pressure0 0.2123335 0.1327391 583.29
gaze static pressure0 - hand static pressure1 -0.2687854 0.1323064 583.25
gaze static pressure0 - gaze static pressure1 0.3141279 0.1338097 583.03
gaze static pressure0 - hand adaptive pressure1 -0.1983752 0.1323064 583.25
gaze static pressure0 - gaze adaptive pressure1 0.3228785 0.1332586 582.65
hand adaptive pressure0 - gaze adaptive pressure0 0.5759669 0.1295218 582.76
hand adaptive pressure0 - hand static pressure1 0.0948481 0.1290894 582.73
hand adaptive pressure0 - gaze static pressure1 0.6777613 0.1310262 583.10
hand adaptive pressure0 - hand adaptive pressure1 0.1652582 0.1290894 582.73
hand adaptive pressure0 - gaze adaptive pressure1 0.6865119 0.1300617 582.09
gaze adaptive pressure0 - hand static pressure1 -0.4811188 0.1279659 582.11
gaze adaptive pressure0 - gaze static pressure1 0.1017944 0.1298916 582.44
t.ratio p.value
1.654 1.0000
-1.102 1.0000
3.341 0.0160
-0.375 1.0000
4.079 0.0011
0.170 1.0000
4.177 0.0008
-2.729 0.1048
1.600 1.0000
-2.032 0.5545
2.348 0.2692
-1.499 1.0000
2.423 0.2355
4.447 0.0003
0.735 1.0000
5.173 <.0001
1.280 1.0000
5.278 <.0001
-3.760 0.0036
0.784 1.0000
Degrees-of-freedom method: kenward-roger
P value adjustment: holm method for 28 tests
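The Holm adjustment applied to the 28 pairwise tests above is the standard step-down procedure: sort p-values ascending, multiply the i-th smallest by (m − i), enforce monotonicity, and cap at 1. A minimal sketch of the method (illustrative, not the emmeans internals):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # (m - rank) shrinks as we move from the smallest to the largest p.
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adj[i] = running  # running max enforces monotonicity
    return adj
```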
4. Movement Time Analysis
Research Question: How does movement time vary across conditions?
Sample Size: N = 26 participants with valid movement time data (correct trials only).
Relationship to Throughput: Movement-time patterns mirror throughput: hand is faster than gaze. Adaptive vs. static and pressure do not show robust main effects on movement time at this N, consistent with the TP results.
Summary Statistics
Movement Time (s) by Condition (N = 26 participants)
|Modality |UI Mode | Pressure| N (pid)| N (trials)| Mean (s)| SD| Median|
|:--------|:-------|--------:|-------:|----------:|--------:|-----:|------:|
|hand |static | 0| 25| 638| 1.115| 0.382| 1.055|
|hand |static | 1| 26| 672| 1.104| 0.318| 1.069|
|hand |adaptive | 0| 25| 644| 1.067| 0.330| 1.018|
|hand |adaptive | 1| 26| 664| 1.111| 0.303| 1.064|
|gaze |static | 0| 24| 492| 1.198| 0.483| 1.107|
|gaze |static | 1| 25| 538| 1.259| 0.535| 1.120|
|gaze |adaptive | 0| 26| 557| 1.328| 0.706| 1.132|
|gaze |adaptive | 1| 25| 529| 1.298| 0.567| 1.153|
Statistical Model Results
Planned Sample Size & Power
The log-RT analysis uses the same 2×2×2 within-subjects design and random-intercept LMM as the throughput analysis. Because throughput and RT are mathematically coupled (TP = ID/RT) and we expect similar medium-sized UI mode and modality effects, the sample-size logic is identical: N = 48 is sufficient for detecting dz ≈ 0.40–0.50 differences with ≈0.80 power, and N = 64 further strengthens power for smaller effects and interactions (Cohen, 1988). Trial-level modeling with many repeated observations per participant increases precision, but our power planning is intentionally conservative and based on participant-level effects rather than naïvely counting trials.
### Model: log_rt ~ modality * ui_mode * pressure + (1 | pid)
**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈4734) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (26).
**Data Summary:** 26 participants, 4734 trials, 8 conditions, minimum 492 trials per condition.
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
modality 11.2853 11.2853 1 4709.0 145.7241 < 2.2e-16 ***
ui_mode 0.2755 0.2755 1 4708.5 3.5579 0.0593237 .
pressure 0.6969 0.6969 1 4709.1 8.9993 0.0027150 **
modality:ui_mode 1.0035 1.0035 1 4708.6 12.9582 0.0003218 ***
modality:pressure 0.0398 0.0398 1 4709.0 0.5140 0.4734569
ui_mode:pressure 0.0091 0.0091 1 4709.3 0.1173 0.7319753
modality:ui_mode:pressure 0.8369 0.8369 1 4709.2 10.8063 0.0010190 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Pairwise Comparisons (Holm-adjusted)
contrast estimate SE df
hand static pressure0 - gaze static pressure0 -0.03657564 0.01675577 Inf
hand static pressure0 - hand adaptive pressure0 0.03803403 0.01554896 Inf
hand static pressure0 - gaze adaptive pressure0 -0.11097521 0.01619107 Inf
hand static pressure0 - hand static pressure1 0.00540702 0.01542684 Inf
hand static pressure0 - gaze static pressure1 -0.09656918 0.01637813 Inf
hand static pressure0 - hand adaptive pressure1 -0.00465645 0.01546792 Inf
hand static pressure0 - gaze adaptive pressure1 -0.11167748 0.01637919 Inf
gaze static pressure0 - hand adaptive pressure0 0.07460967 0.01671276 Inf
gaze static pressure0 - gaze adaptive pressure0 -0.07439957 0.01730357 Inf
gaze static pressure0 - hand static pressure1 0.04198266 0.01660071 Inf
gaze static pressure0 - gaze static pressure1 -0.05999354 0.01743526 Inf
gaze static pressure0 - hand adaptive pressure1 0.03191918 0.01664235 Inf
gaze static pressure0 - gaze adaptive pressure1 -0.07510184 0.01748569 Inf
hand adaptive pressure0 - gaze adaptive pressure0 -0.14900924 0.01615716 Inf
hand adaptive pressure0 - hand static pressure1 -0.03262701 0.01539023 Inf
hand adaptive pressure0 - gaze static pressure1 -0.13460321 0.01633408 Inf
hand adaptive pressure0 - hand adaptive pressure1 -0.04269049 0.01543168 Inf
hand adaptive pressure0 - gaze adaptive pressure1 -0.14971151 0.01634394 Inf
gaze adaptive pressure0 - hand static pressure1 0.11638223 0.01595706 Inf
gaze adaptive pressure0 - gaze static pressure1 0.01440603 0.01684592 Inf
z.ratio p.value
-2.183 0.2614
2.446 0.1444
-6.854 <.0001
0.350 1.0000
-5.896 <.0001
-0.301 1.0000
-6.818 <.0001
4.464 0.0001
-4.300 0.0003
2.529 0.1258
-3.441 0.0075
1.918 0.3858
-4.295 0.0003
-9.222 <.0001
-2.120 0.2721
-8.241 <.0001
-2.766 0.0680
-9.160 <.0001
7.293 <.0001
0.855 1.0000
Degrees-of-freedom method: asymptotic
P value adjustment: holm method for 28 tests
5. Fitts’ Law Modelling
Research Question: How well does the data fit Fitts’ Law? (Linearity check).
Planned Sample Size & Power
Fitts’ law analyses serve primarily to validate the pointing task and modality differences, not to test the core adaptation hypotheses. The ID effect on movement time is typically very large (R² > .70), and robust Fitts-law slopes are observable with as few as 10–20 participants in classic HCI work. In this study, any final sample N ≥ 30 is more than sufficient for stable ID slopes; our planned N = 48 places this analysis in an over-powered, descriptive regime. We therefore do not perform formal power calculations here and treat Fitts regression as a manipulation check and descriptive characterization of the dataset.
Sample Size: N = 26 participants with valid throughput data.
Flatter slopes indicate less sensitivity to difficulty (ballistic movement).
Linear Regression: MT ~ IDe (N = 26 participants)
|Modality |UI Mode | Slope (s/bit)| Intercept (s)| R²|
|:--------|:-------|-------------:|-------------:|-----:|
|hand |static | 0.536| 0.147| 0.533|
|hand |adaptive | 0.538| 0.133| 0.570|
|gaze |static | 0.253| 0.179| 0.595|
|gaze |adaptive | 0.228| 0.214| 0.538|
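The per-condition fits are ordinary least squares of MT on IDe. A self-contained sketch of the regression and R² computation (illustrative, not the analysis script):

```python
def fit_fitts(ids, mts):
    """OLS fit of MT = a + b * ID; returns (intercept a, slope b, r_squared)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, mts))
    b = sxy / sxx                      # slope: seconds per bit
    a = mean_mt - b * mean_id          # intercept: reaction/start-up cost
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ids, mts))
    ss_tot = sum((y - mean_mt) ** 2 for y in mts)
    return a, b, 1.0 - ss_res / ss_tot
```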
6. Error Rate Analysis
Research Question: How do error rates differ across conditions?
Sample Size: N = 26 participants with all trials (correct + incorrect).
Error Rates by Condition (N = 26 participants)
|Modality |UI Mode | Pressure| Participants| Mean Error Rate (%)| SD (%)|
|:--------|:-------|--------:|------------:|-------------------:|------:|
|hand |static | 0| 25| 5.48| 7.02|
|hand |static | 1| 26| 4.27| 5.99|
|hand |adaptive | 0| 25| 4.59| 7.27|
|hand |adaptive | 1| 26| 5.41| 6.64|
|gaze |static | 0| 24| 23.69| 18.44|
|gaze |static | 1| 25| 20.16| 12.51|
|gaze |adaptive | 0| 26| 20.43| 12.49|
|gaze |adaptive | 1| 25| 21.63| 14.21|
Error Rate by Modality and UI Mode (participant-level means). N = 26 participants. Raincloud plot: mirrored half-violins (Static←left, Adaptive→right) with boxplots inside, individual points in columns, connecting lines show paired comparisons.
**Error Rate Summary:** Overall error rate was 13.5%. Errors were concentrated in gaze conditions (22.2%), while hand remained near 4.9%. At this interim N and low per-participant error counts, we report these differences descriptively.
Statistical Model Results
Planned Sample Size & Power
For the error-rate analysis we fit a binomial GLMM with random intercepts per participant. The key contrasts are again UI mode and modality, where we expect odds-ratio effects in the small-to-medium range (e.g., OR ≈ 0.7–0.8 for adaptive vs static, and OR ≈ 2–3 for gaze vs hand). Binary outcomes with relatively low error rates (≈10–15%) typically require more participants than continuous outcomes for stable mixed-effects estimation (Kumle et al., 2021). For this analysis, we therefore treat N = 64 as a “good” target that yields comfortable power for medium effects, while N = 48 remains adequate but somewhat less stable, especially for interaction terms and rare error types. Error-based interaction effects are interpreted as exploratory, even at N = 64.
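For intuition about the odds-ratio scale used in this planning: plugging the interim descriptive error rates (gaze ≈ 22.2%, hand ≈ 4.9%) into the usual odds-ratio formula gives an OR near 5.5 for gaze vs. hand, larger than the planned-for OR ≈ 2–3. A minimal sketch (function name illustrative):

```python
def odds_ratio(p1, p2):
    """Odds ratio comparing two probabilities: (p1/(1-p1)) / (p2/(1-p2))."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))

# Interim descriptives: gaze ~ 0.222, hand ~ 0.049 -> OR ~ 5.5
```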
### Model: error ~ modality * ui_mode * pressure + (1 | pid)
**Model Structure:** Random intercept only (participant-level random effects). The trial-level N (≈5475) should not be mistaken for independent units in discussions of power; the effective N for inference is the number of participants (26).
**Data Summary:** 26 participants, 5475 trials, 8 conditions, minimum 645 trials per condition.
**Overall Error Rate:** 13.5 %
#### ANOVA Table
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: error
Chisq Df Pr(>Chisq)
(Intercept) 187.4919 1 <2e-16 ***
modality 82.2650 1 <2e-16 ***
ui_mode 0.5819 1 0.4456
pressure 0.9846 1 0.3211
modality:ui_mode 0.0000 1 0.9965
modality:pressure 0.6884 1 0.4067
ui_mode:pressure 1.5809 1 0.2086
modality:ui_mode:pressure 0.8457 1 0.3578
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Pairwise Comparisons (Holm-adjusted)
contrast odds.ratio SE df
hand static pressure0 / gaze static pressure0 0.166696 0.0329272 Inf
hand static pressure0 / hand adaptive pressure0 1.212575 0.3063988 Inf
hand static pressure0 / gaze adaptive pressure0 0.202389 0.0400403 Inf
hand static pressure0 / hand static pressure1 1.287513 0.3279074 Inf
hand static pressure0 / gaze static pressure1 0.169015 0.0331117 Inf
hand static pressure0 / hand adaptive pressure1 0.996022 0.2398462 Inf
hand static pressure0 / gaze adaptive pressure1 0.190235 0.0375784 Inf
gaze static pressure0 / hand adaptive pressure0 7.274181 1.5306456 Inf
gaze static pressure0 / gaze adaptive pressure0 1.214121 0.1675520 Inf
gaze static pressure0 / hand static pressure1 7.723729 1.6435371 Inf
gaze static pressure0 / gaze static pressure1 1.013914 0.1369220 Inf
gaze static pressure0 / hand adaptive pressure1 5.975089 1.1691062 Inf
gaze static pressure0 / gaze adaptive pressure1 1.141213 0.1570270 Inf
hand adaptive pressure0 / gaze adaptive pressure0 0.166908 0.0351470 Inf
hand adaptive pressure0 / hand static pressure1 1.061800 0.2811833 Inf
hand adaptive pressure0 / gaze static pressure1 0.139385 0.0291040 Inf
hand adaptive pressure0 / hand adaptive pressure1 0.821410 0.2064080 Inf
hand adaptive pressure0 / gaze adaptive pressure1 0.156885 0.0330059 Inf
gaze adaptive pressure0 / hand static pressure1 6.361580 1.3526089 Inf
gaze adaptive pressure0 / gaze static pressure1 0.835101 0.1125901 Inf
null z.ratio p.value
1 -9.070 <.0001
1 0.763 1.0000
1 -8.075 <.0001
1 0.992 1.0000
1 -9.074 <.0001
1 -0.017 1.0000
1 -8.401 <.0001
1 9.430 <.0001
1 1.406 1.0000
1 9.607 <.0001
1 0.102 1.0000
1 9.136 <.0001
1 0.960 1.0000
1 -8.502 <.0001
1 0.226 1.0000
1 -9.437 <.0001
1 -0.783 1.0000
1 -8.804 <.0001
1 8.702 <.0001
1 -1.337 1.0000
P value adjustment: holm method for 28 tests
Tests are performed on the log odds ratio scale
7. Accuracy & Gaze Dynamics
Sample Size: N = 26 participants with valid accuracy data.
Effective Width (\(W_e\))
Planned Sample Size & Power
Effective width (We) is analyzed at the participant × condition level with a Gaussian LMM. We expect medium effects of modality (gaze > hand) and small-to-medium effects of UI mode (adaptive slightly improving spatial precision). For within-subject effects of this magnitude, N ≈ 48 is sufficient for ≈0.80 power (dz ≈ 0.4–0.5) according to standard repeated-measures power guidelines (Cohen, 1988). We therefore treat N = 48 as a good target for We, with N = 64 mainly helping if UI-mode effects turn out closer to dz ≈ 0.3.
Lower \(W_e\) indicates tighter shot grouping (higher precision).
Effective Width (px) by Condition (N = 26 participants)
|Modality |UI Mode | Pressure| N (pid)| Mean We (px)| SD|
|:--------|:-------|--------:|-------:|------------:|-----:|
|hand |static | 0| 25| 32.67| 20.16|
|hand |static | 1| 26| 32.80| 21.69|
|hand |adaptive | 0| 25| 33.06| 21.49|
|hand |adaptive | 1| 26| 33.22| 20.66|
|gaze |static | 0| 23| 35.72| 20.59|
|gaze |static | 1| 25| 36.31| 20.45|
|gaze |adaptive | 0| 26| 35.39| 20.02|
|gaze |adaptive | 1| 25| 36.80| 19.32|
Effective target width was broadly similar between Static and Adaptive within each modality; gaze showed slightly larger We overall, consistent with higher variability in endpoint location.
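We follows the standard 4.133 × SD convention on endpoint deviations. A sketch, assuming deviations have been projected onto the task axis (names illustrative):

```python
import math

def effective_width(endpoint_devs_px):
    """Effective width We = 4.133 * SD of signed endpoint deviations along
    the task axis (the ISO 9241-9 convention, covering ~96% of endpoints)."""
    n = len(endpoint_devs_px)
    mean = sum(endpoint_devs_px) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in endpoint_devs_px) / (n - 1))
    return 4.133 * sd
```

Tighter endpoint clustering (smaller SD) yields a smaller We and hence a larger IDe for the same movement amplitude.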
Endpoint Accuracy Scatter Plot
Visualization of endpoint errors relative to target center. Each point represents one trial’s endpoint position.
Endpoint Error Distance (px) for Gaze Modality
|UI Mode | Pressure| N (trials)| Mean (px)| SD| Median|
|:-------|--------:|----------:|---------:|-----:|------:|
|static | 0| 492| 11.97| 8.21| 9.58|
|static | 1| 538| 11.95| 8.17| 9.59|
|adaptive | 0| 557| 11.72| 7.96| 9.61|
|adaptive | 1| 529| 12.62| 8.21| 10.52|
The “Midas Touch” Struggle
Planned Sample Size & Power
Target re-entries are count-like and somewhat noisy, but we again analyze participant-level averages with an LMM (or, if needed, a Poisson GLMM). We anticipate medium modality effects (more re-entries for gaze) and small-to-medium UI-mode effects (fewer re-entries under adaptation). Given the noisier nature of this metric, a slightly larger sample is desirable if it is to be treated as confirmatory. We therefore treat N = 48 as adequate but exploratory and N = 64 as a “good” sample size for detecting medium within-subject effects in re-entry counts. Power reasoning follows the same logic as other continuous repeated-measures outcomes, tempered by mixed-model guidance from Kumle et al. (2021).
Target Re-entries measure how often the cursor drifted out of the target before selection.
Re-entries are interpreted here as a proxy for control stability; higher counts suggest more corrective movements. We will revisit this metric in the control-theory analyses (Section 10).
Target Re-entries by Condition (N = 26 participants)
|Modality |UI Mode | Pressure| N (pid)| Mean| SD|
|:--------|:-------|--------:|-------:|----:|----:|
|hand |static | 0| 25| 0.26| 0.53|
|hand |static | 1| 26| 0.23| 0.47|
|hand |adaptive | 0| 25| 0.23| 0.46|
|hand |adaptive | 1| 26| 0.22| 0.47|
|gaze |static | 0| 23| 2.15| 1.21|
|gaze |static | 1| 25| 2.15| 1.01|
|gaze |adaptive | 0| 26| 2.19| 1.19|
|gaze |adaptive | 1| 25| 2.23| 1.43|
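Re-entry counting reduces to counting outside→inside transitions after the first target entry. A sketch, under the assumption that trajectories are available as per-sample inside/outside flags (names illustrative):

```python
def count_reentries(inside_flags):
    """Count target re-entries: entries into the target after the first one.

    inside_flags: per-sample booleans, True while the cursor is inside the target.
    """
    entries = 0
    prev = False
    for inside in inside_flags:
        if inside and not prev:   # rising edge = one entry
            entries += 1
        prev = inside
    return max(0, entries - 1)    # the first entry is not a re-entry
```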
8. Workload (NASA-TLX)
Subjective workload scores (lower is better).
Sample Size: N = 26 participants with TLX data.
Statistical Model: Overall TLX
Planned Sample Size & Power
NASA-TLX scores (overall and subscales) are collected at the block level and analyzed with an LMM (random intercepts per participant; fixed effects for modality and UI mode). TLX scores tend to be reasonably reliable, and we expect medium effects for both modality (gaze > hand) and UI mode (adaptive < static), especially on Physical Demand and Frustration. For within-subject designs with medium effects, ≈40–50 participants typically provide ≥0.80 power (Brysbaert, 2019). We therefore treat N = 48 as a good, pre-planned N for TLX analyses. An increase to N = 64 would mostly refine confidence intervals and interaction estimates rather than change the main power conclusions.
### Model: overall_tlx ~ modality * ui_mode + (1 | pid)
**Data Summary:** 26 participants, 243 observations.
#### ANOVA Table
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
modality 1458.47 1458.47 1 183.61 23.6453 2.48e-06 ***
ui_mode 2.38 2.38 1 183.60 0.0386 0.8444
modality:ui_mode 115.18 115.18 1 183.63 1.8673 0.1735
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### Estimated Marginal Means (Overall TLX by Modality × UI Mode)
Table: Estimated Marginal Means for Overall TLX by Condition (95% CI)
|Modality |UI Mode | Mean TLX| 95% CI Lower| 95% CI Upper|
|:--------|:--------|--------:|------------:|------------:|
|Hand |Static | 42.7| 36.6| 48.8|
|Gaze |Static | 46.6| 40.5| 52.7|
|Hand |Adaptive | 41.4| 35.1| 47.7|
|Gaze |Adaptive | 48.3| 42.1| 54.5|
9. Learning Curves & Practice Effects
Research Question: How does performance change within each condition? Do learning rates differ by condition?
Sample Size: N = 26 participants with trial-level data.
Note: These learning curves serve as a quality check that participants improved modestly and reached a plateau; we do not treat these as primary inferential outcomes. This analysis is exploratory/QC only.
This section shows learning curves aligned by condition start (accounting for Williams counterbalancing). For block-level trends, see Section 12.
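For context, a Williams square for an even number of conditions can be built from one base sequence plus cyclic shifts; each condition then appears once per row and column, and each ordered pair of neighbours occurs exactly once (balancing first-order carryover). The sketch below is one standard construction (n = 8 rows for the 2×2×2 design, so N = 48 corresponds to six complete squares); it is not necessarily the exact sequences used in this study:

```python
def williams_square(n):
    """Williams Latin square for an even number of conditions n.

    Base sequence alternates low/high condition indices (0, n-1, 1, n-2, ...);
    the remaining n-1 rows are cyclic shifts of the base.
    """
    base, lo, hi = [], 0, n - 1
    for i in range(n):
        if i % 2 == 0:
            base.append(lo)
            lo += 1
        else:
            base.append(hi)
            hi -= 1
    return [[(c + r) % n for c in base] for r in range(n)]
```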
Learning Curve Data Summary by Condition (N = 26 participants)
|Modality |UI Mode |Pressure | N (positions)| Mean MT (s)| SD|
|:--------|:-------|:--------|-------------:|-----------:|------:|
|Hand |Static |OFF | 27| 1.114| 0.0548|
|Hand |Static |ON | 27| 1.104| 0.0427|
|Hand |Adaptive |OFF | 27| 1.067| 0.0459|
|Hand |Adaptive |ON | 27| 1.111| 0.0541|
|Gaze |Static |OFF | 27| 1.198| 0.2373|
|Gaze |Static |ON | 27| 1.261| 0.2325|
|Gaze |Adaptive |OFF | 27| 1.328| 0.2045|
|Gaze |Adaptive |ON | 27| 1.302| 0.2163|
Error Rate Summary by Condition
|Modality |UI Mode |Pressure | N (positions)| Mean Error| Min| Max|
|:--------|:-------|:--------|-------------:|----------:|------:|------:|
|Hand |Static |OFF | 27| 5.48%| 0.00%| 12.00%|
|Hand |Static |ON | 27| 4.27%| 0.00%| 11.54%|
|Hand |Adaptive |OFF | 27| 4.59%| 0.00%| 16.00%|
|Hand |Adaptive |ON | 27| 5.41%| 0.00%| 23.08%|
|Gaze |Static |OFF | 27| 23.73%| 8.33%| 37.50%|
|Gaze |Static |ON | 27| 23.25%| 7.69%| 42.31%|
|Gaze |Adaptive |OFF | 27| 20.45%| 3.85%| 34.62%|
|Gaze |Adaptive |ON | 27| 21.63%| 8.00%| 40.00%|
10. Movement Quality Metrics
Submovement Analysis
Research Question: Does adaptive UI reduce movement corrections? How do submovements relate to performance?
Submovements indicate intermittent control - fewer submovements suggest smoother, more ballistic movements.
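Submovement counting from a speed profile typically amounts to counting supra-threshold velocity peaks beyond the primary peak. A simplified sketch (the recomputation pipeline may apply smoothing and stricter peak criteria; the threshold and names are illustrative):

```python
def count_submovements(speeds, threshold):
    """Count submovements as local maxima of the speed profile exceeding a
    minimum-speed threshold, excluding the primary (first) peak."""
    peaks = 0
    for i in range(1, len(speeds) - 1):
        # local maximum above threshold
        if speeds[i] > threshold and speeds[i - 1] < speeds[i] >= speeds[i + 1]:
            peaks += 1
    return max(0, peaks - 1)

# A single-peaked (ballistic) profile yields 0 submovements, consistent with
# the zero counts observed for the hand modality below.
```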
Planned Sample Size & Power
Submovement count is a noisier movement-quality metric and is currently based on pre-computed peaks. We anticipate small-to-medium effects of UI mode (adaptive reducing corrective movements) and medium effects of modality, but with considerable between-participant variability. For such count-based metrics, simulation-based power analysis is strongly recommended (e.g., using the approach in Kumle et al., 2021). As a rule of thumb, N = 64–72 would be needed to treat submovement differences as confirmatory (especially for UI-mode effects), whereas N = 48 is more appropriate for exploratory visualization and effect-size estimation rather than strict NHST.
Data Availability Note: Submovement metrics are available for a subset of the interim sample (see counts below). All results in this section are descriptive and should be treated as preliminary engineering diagnostics, not inferential findings. We distinguish between:
- Participants with `submovement_count` (legacy, pre-computed): N = 3
- Participants with `submovement_count_recomputed` (from trajectory data): N = 16
- Participants with full trajectory JSON data: N = 16
Submovement Count by Condition (N = 16 participants, using submovement_count_recomputed)
|Modality |UI Mode | Pressure| N (pid)| N (trials)| Mean| SD| Median|
|:--------|:-------|--------:|-------:|----------:|----:|----:|------:|
|hand |static | 0| 15| 392| 0.00| 0.00| 0|
|hand |static | 1| 16| 414| 0.00| 0.00| 0|
|hand |adaptive | 0| 15| 388| 0.00| 0.00| 0|
|hand |adaptive | 1| 16| 410| 0.00| 0.00| 0|
|gaze |static | 0| 15| 303| 8.68| 4.96| 7|
|gaze |static | 1| 15| 324| 9.14| 5.14| 8|
|gaze |adaptive | 0| 16| 347| 9.71| 6.18| 8|
|gaze |adaptive | 1| 15| 320| 9.09| 5.75| 7|
ℹ **Note:** The hand modality shows zero submovements: no velocity peaks beyond the primary movement were detected. This reflects very smooth, ballistic movements and is valid data.
Verification Time Analysis
Research Question: How much time is spent “stopping” vs. “moving”? Does adaptive UI reduce verification time?
Sample Size: N = 26 participants with verification time data.
Planned Sample Size & Power
Verification time (from first target entry to final selection) is conceptually closer to a decision-phase measure and serves as a bridge to future LBA modeling. We again expect medium modality effects and small-to-medium UI-mode effects, and we analyze it via an LMM. Because this outcome is continuous and based on many trials per participant, N = 48 is a good target for medium effects, and N = 64 provides added stability for smaller UI-mode differences or more complex interaction patterns. The same repeated-measures power guidelines apply as for RT and TP (Cohen, 1988).
Verification time represents the “precise stopping” phase, separate from the ballistic movement phase.
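Operationally, verification time is the interval from first target entry to final selection. A sketch, assuming per-sample timestamps and inside-target flags are available (names illustrative):

```python
def verification_time(timestamps, inside_flags):
    """Time from first target entry to final selection (the last sample),
    i.e. the 'precise stopping' phase; None if the target was never entered."""
    for t, inside in zip(timestamps, inside_flags):
        if inside:
            return timestamps[-1] - t
    return None
```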
11. Error Patterns & Types
Research Question: What types of errors occur? Do error patterns differ by condition?
Sample Size: N = 26 participants with error type data.
**Error Type Summary:** Gaze produced substantially more misses than hand, consistent with its lower throughput. The adaptive UI did not yet show a clear reduction in any specific error type at N=26.
12. Block Order & Temporal Effects
Research Question: Are there order effects? Does performance improve or degrade over blocks?
Sample Size: N = 26 participants with block-level data.
Note: This section is exploratory/QC only. These analyses serve as quality checks for temporal trends and are not treated as primary inferential outcomes.
Block-Level Data Summary by Condition
|Modality |UI Mode |Pressure | N Blocks| Mean Error Rate|
|:--------|:-------|:--------|--------:|---------------:|
|Hand |Static |OFF | 8| 4.75%|
|Hand |Static |ON | 8| 3.35%|
|Hand |Adaptive |OFF | 8| 3.64%|
|Hand |Adaptive |ON | 8| 4.95%|
|Gaze |Static |OFF | 8| 22.96%|
|Gaze |Static |ON | 8| 21.06%|
|Gaze |Adaptive |OFF | 8| 19.74%|
|Gaze |Adaptive |ON | 8| 21.64%|
Performance Across Blocks: Movement Time. Movement time by block number. Lower is better. Shaded regions show ±1 SE.
13. Spatial Patterns & Heatmaps
Research Question: Are there spatial biases in performance? Do some screen regions show better/worse performance?
Sample Size: N = 26 participants with spatial position data.
Note: These spatial visualizations are exploratory and serve as descriptive quality checks. They are not treated as primary inferential outcomes. At N=26, interpretation is limited, but these plots may be useful for understanding XR-specific spatial patterns (e.g., top vs bottom of visual field).
Error Density Heatmap
Where do endpoint errors occur? Are there systematic spatial biases?
14. Adaptive UI Mechanism Analysis
Width Scaling (Target Size Adaptation)
Research Question: Does the adaptive UI dynamically change target sizes? How does width scaling relate to performance?
Sample Size: N = 19 participants with width scaling data.
Status: In the current dataset, the width scaling mechanism was disabled/misconfigured; all recorded width_scale_factor values equal 1.0. Results here serve as a template for future analysis once scaling is active.
The adaptive UI may scale target widths based on performance. This section examines whether and how target sizes are adjusted.
**Note:** No target width scaling was observed in this dataset.
All `width_scale_factor` values are 1.0 (no scaling applied).
This indicates that the adaptive policy did not trigger during data collection.
Possible reasons:
- Hysteresis gate threshold not met (requires N consecutive slow/error trials)
- Performance thresholds (RT p75, error burst) not exceeded
- Adaptive policy not properly configured or enabled
- Participants performed well enough that adaptation was not needed
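The hysteresis gate mentioned above can be sketched as a consecutive-trial counter that fires only after N qualifying trials in a row. A minimal Python sketch (the window size and the trigger predicate are assumptions, not the deployed configuration):

```python
# Sketch of a hysteresis gate: adapt only after N consecutive slow/error
# trials. The window of 3 and the boolean predicate are illustrative.
class HysteresisGate:
    def __init__(self, n_consecutive: int = 3):
        self.n_consecutive = n_consecutive
        self._streak = 0

    def update(self, trial_was_slow_or_error: bool) -> bool:
        """Return True when the gate opens (adaptation should trigger)."""
        self._streak = self._streak + 1 if trial_was_slow_or_error else 0
        if self._streak >= self.n_consecutive:
            self._streak = 0          # reset after firing
            return True
        return False

gate = HysteresisGate(n_consecutive=3)
events = [gate.update(flag) for flag in [True, True, False, True, True, True]]
print(events)  # [False, False, False, False, False, True]
```

Under such a gate, a single slow trial (or two, interrupted by a good one) never triggers adaptation, which is consistent with the all-1.0 scale factors observed here if participants rarely produced long bad streaks.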
Target Width Scaling by Condition (N = 19 participants, No Scaling Observed)
| Modality | UI Mode | Pressure | N Participants | N Trials | Mean Scale | SD Scale | Min Δ from 1.0 | Max Δ from 1.0 | % Trials Scaled |
|:---------|:--------|:--------:|---------------:|---------:|-----------:|---------:|---------------:|---------------:|----------------:|
| hand | static | 0 | 18 | 486 | 1 | 0 | 0 | 0 | 0 |
| hand | static | 1 | 19 | 513 | 1 | 0 | 0 | 0 | 0 |
| hand | adaptive | 0 | 18 | 486 | 1 | 0 | 0 | 0 | 0 |
| hand | adaptive | 1 | 19 | 513 | 1 | 0 | 0 | 0 | 0 |
| gaze | static | 0 | 18 | 486 | 1 | 0 | 0 | 0 | 0 |
| gaze | static | 1 | 19 | 513 | 1 | 0 | 0 | 0 | 0 |
| gaze | adaptive | 0 | 19 | 513 | 1 | 0 | 0 | 0 | 0 |
| gaze | adaptive | 1 | 18 | 486 | 1 | 0 | 0 | 0 | 0 |
**Note:** All `width_scale_factor` values remain at 1.0 throughout the experiment: the adaptive policy never triggered a target size adjustment, so no relationship between width scaling and performance can be assessed. The corresponding plot shows all points at x = 1.0.
Alignment Gate Metrics
Research Question: If alignment gates are used, how do they affect performance? How often are false triggers detected?
Alignment gates may be used to ensure proper cursor alignment before selection. This section examines their usage and effectiveness.
**Alignment Gate Interpretation:** False triggers were rare (mean = 0.04 per trial). Adaptive UI did not show a meaningful change in false trigger rate compared to Static at this interim N.
ℹ **Note:** No recovery time data for the gaze modality: the alignment gate always passed (no false triggers) for these trials, so no recovery times were recorded.
Task Type Analysis
Research Question: Are there different task types (point vs. drag)? How does performance differ across task types?
If the experiment includes different task types, this section examines performance differences.
Performance by Task Type
| Task Type | Modality | UI Mode | N Trials | Mean MT (ms) | SD MT (ms) | Error Rate (%) |
|:----------|:---------|:--------|---------:|-------------:|-----------:|---------------:|
| drag | hand | static | 666 | 1079.0 | 334.7 | 4.65 |
| drag | hand | adaptive | 666 | 1053.6 | 323.9 | 5.86 |
| drag | gaze | static | 618 | 1222.2 | 549.2 | 17.96 |
| drag | gaze | adaptive | 651 | 1268.7 | 583.9 | 20.28 |
| point | hand | static | 333 | 1071.9 | 306.6 | 4.80 |
| point | hand | adaptive | 333 | 1073.9 | 313.0 | 4.20 |
| point | gaze | static | 309 | 1223.2 | 536.9 | 22.98 |
| point | gaze | adaptive | 321 | 1288.5 | 554.2 | 17.13 |
Planned Sample Size & Power
Path-length efficiency (actual path / straight-line amplitude) is analyzed at the trial level but interpreted as a within-subject continuous outcome, with expected medium modality differences (longer, less efficient paths for gaze) and small-to-medium UI-mode effects. We treat N = 48 as a reasonable “good N” for detecting medium effects (dz ≈ 0.4–0.5), and N = 64 as an ideal target if path efficiency becomes more central to the argument. At both Ns, this analysis is secondary to the core throughput and RT results.
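The definition above (actual path length divided by straight-line amplitude) can be made concrete. A minimal Python sketch (the report's analyses are in R; the trajectory format as a list of (x, y) samples is an assumption):

```python
import math

# Sketch: path ratio = traversed path length / straight-line amplitude.
# A ratio of 1.0 means a perfectly straight movement; larger values mean
# more excess path (less efficient). Point format is illustrative.
def path_ratio(points):
    if len(points) < 2:
        raise ValueError("need at least two samples")
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    amplitude = math.dist(points[0], points[-1])
    return path / amplitude

straight = [(0, 0), (50, 0), (100, 0)]
detour = [(0, 0), (50, 40), (100, 0)]
print(path_ratio(straight))            # 1.0
print(round(path_ratio(detour), 3))    # 1.281
```

Note that averaging per-trial ratios is not the same as dividing mean path by mean amplitude (Jensen's inequality), which matters when comparing condition-level summaries of this metric.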
Path Length and Efficiency Metrics by Condition
| Modality | UI Mode | Pressure | N Trials | Mean Path (px) | Mean Amplitude (px) | Mean Path Ratio (path/A) | Mean Efficiency (A/path) | Mean Excess Path (px) | Mean MT (ms) |
|:---------|:--------|:--------:|---------:|---------------:|--------------------:|-------------------------:|-------------------------:|----------------------:|-------------:|
| hand | static | 0 | 348 | 753.0 | 370.1 | 2.40 | 0.516 | 382.9 | 1076.5 |
| hand | static | 1 | 367 | 753.5 | 372.2 | 2.39 | 0.513 | 381.3 | 1082.5 |
| hand | adaptive | 0 | 347 | 755.9 | 372.3 | 2.43 | 0.519 | 383.5 | 1037.6 |
| hand | adaptive | 1 | 367 | 760.0 | 374.4 | 2.40 | 0.513 | 385.6 | 1094.3 |
| gaze | static | 0 | 267 | 688.1 | 358.1 | 2.11 | 0.543 | 330.1 | 1239.6 |
| gaze | static | 1 | 294 | 702.2 | 364.6 | 2.12 | 0.541 | 337.6 | 1272.5 |
| gaze | adaptive | 0 | 313 | 701.9 | 367.4 | 2.08 | 0.551 | 334.4 | 1349.7 |
| gaze | adaptive | 1 | 291 | 767.3 | 362.2 | 2.28 | 0.508 | 405.1 | 1308.8 |
⚠ Cannot create ID bins (insufficient variation or invalid break points); skipping the ID binning plot.
15. Gaze-Specific Analysis: Hover/Dwell Time
Research Question: How does hover/dwell time vary across gaze conditions? Does adaptive UI affect dwell time before confirmation?
Planned Sample Size & Power
Hover/dwell time is modeled only for gaze trials with fixed effects for UI mode and pressure. Because this shrinks the effective dataset and the expected UI-mode effects may be small-to-medium (dz ≈ 0.3–0.5), we treat this analysis as exploratory unless N ≥ 64. At N = 48, the study is adequately powered for medium effects but underpowered for smaller ones; at N = 64, we expect ≈0.80 power even if the UI-mode effect is closer to dz ≈ 0.35, based on standard repeated-measures calculations and mixed-model heuristics (Cohen, 1988; Kumle et al., 2021).
Sample Size: N = 0 participants with gaze hover/dwell data (none logged in the interim dataset).
Note: Because no hover/dwell data are available yet, this analysis cannot be run at the interim N; it will be revisited at N=48 once hover/dwell logging is in place.
Hover/dwell time represents the duration the cursor remains in the target before confirmation in gaze trials. This metric is specific to gaze modality and reflects the “Midas touch” problem—the need for deliberate confirmation to avoid unintended selections.
⚠ No valid hover/dwell time data available for gaze trials.
- Hierarchical LBA (verification-time RTs); requires more data and packages (RWiener/rtdists).
- Control-theory kinematics (velocity profiles, submovement decomposition) once trajectories vetted.
- Identification checks: need ≥2 levels per factor and adequate trial counts per cell (~24+).
- Error-type breakdowns and spatial heatmaps to remain exploratory/QC until full N.
- Revisit hover/dwell and path-length efficiency once gaze data are richer.
16. Planned Advanced Analysis: Linear Ballistic Accumulator (LBA) – No models fit yet at current interim N
Research Question: Can we model the verification phase (time from target entry to selection) using LBA parameters? Do adaptive conditions show different decision thresholds?
Status: ⚠️ PLANNED ANALYSIS - No LBA models fit yet at current interim N (N=26).
Planned Sample Size & Power
The hierarchical LBA analysis will be run on verification-time RTs with parameters (v, b, A, t₀) varying by modality and UI mode. Power and parameter recovery in diffusion/accumulator models depend more on trials per participant than on sheer N, but group-level comparisons still require a sufficient number of participants. Studies on parameter recovery for DDM/LBA and related models generally recommend ≥100 trials per condition and at least 30–40 participants for stable hierarchical estimates. The present design (≈24 trials × 8 conditions = 192 trials per participant) is already strong on the trial side. For group-level parameter differences, however, a target of N ≥ 64 is advisable; N = 48 is workable but will lead to wider credible intervals on parameter contrasts. Ideally, LBA power for this specific parameterization would be validated via simulation (e.g., using the approach described in Kumle et al., 2021, for mixed models).
This section documents the planned LBA analysis. No LBA models have been fit for the current interim dataset; this is included for transparency and to guide future work.
Linear Ballistic Accumulator models decompose reaction time into decision and non-decision components. For gaze-based interaction, we hypothesize that adaptive UI reduces decision threshold (b), indicating less caution needed when targets are easier to acquire.
⚠️ **LBA Analysis Not Yet Implemented**
This section will analyze verification-time RTs using hierarchical Linear Ballistic Accumulator models.
- Basic verification_time_ms analysis: ✅ DONE (Section 10)
- LBA model fitting: ❌ PENDING
- Hierarchical parameter estimation: ❌ PENDING
- LBA requires RT data from the verification phase (time from target entry to selection)
- Model fitting can be done using `RWiener` or `rtdists` packages
- Key parameters to estimate: drift rate (v), threshold (b), starting point (A), non-decision time (t0)
- Hypothesis: Adaptive conditions should show lower threshold (b), indicating less caution needed
- Hierarchical modeling will account for participant-level variation
- Verification time data available: 2656 trials
- Requires sufficient trial counts per condition for stable parameter estimation
- Will be implemented once N reaches target sample size
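The generative process behind the planned model can be sketched by forward simulation, which is also the basis of simulation-based power checks. A minimal Python sketch of a single LBA trial (the actual fitting would use `rtdists` or similar; all parameter values here are illustrative, not fitted):

```python
import random

# Sketch of the generative LBA model (Brown & Heathcote, 2008): each
# accumulator starts at k ~ Uniform(0, A), accrues evidence at a rate drawn
# from Normal(v_i, s), and the response is the first to reach threshold b;
# RT = t0 + winning accumulator's decision time.
def simulate_lba_trial(v=(3.0, 2.0), b=1.2, A=0.8, s=0.3, t0=0.25, rng=random):
    finish_times = []
    for drift_mean in v:
        d = max(rng.gauss(drift_mean, s), 1e-6)   # truncate to avoid non-terminating accumulators
        k = rng.uniform(0.0, A)
        finish_times.append((b - k) / d)
    decision_time = min(finish_times)
    winner = finish_times.index(decision_time)    # index 0 = "correct" accumulator here
    return winner, t0 + decision_time

rng = random.Random(7)
trials = [simulate_lba_trial(rng=rng) for _ in range(1000)]
accuracy = sum(1 for w, _ in trials if w == 0) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(round(accuracy, 3), round(mean_rt, 3))
```

Under this parameterization, lowering the threshold b shortens decision times without changing drift, which is exactly the signature the planned adaptive-vs-static contrast targets.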
**Power Considerations:**
- N=48 is sufficient for medium main effects (dz≈0.41, power≈0.80)
- LBA parameters require careful convergence diagnostics
- See `POWER_ANALYSIS_EXPERT_RESPONSE.md` for detailed recommendations
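The dz ≈ 0.41 / N = 48 / power ≈ 0.80 figure cited above can be sanity-checked by simulation. A Monte Carlo Python sketch of two-sided paired t-test power (the hard-coded t_crit ≈ 2.0117 is the α = .05 two-sided critical value for df = 47 and would need adjusting for other N):

```python
import random
import statistics

# Sketch: Monte Carlo power of a paired t-test on standardized differences
# (mean dz, SD 1). Reject when |t| exceeds the df = 47 critical value.
def simulated_power(dz, n, t_crit=2.0117, n_sims=4000, seed=1):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(dz, 1.0) for _ in range(n)]
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / n ** 0.5)
        rejections += abs(t) > t_crit
    return rejections / n_sims

print(round(simulated_power(0.41, 48), 2))  # ~0.8, consistent with the cited figure
```

With 4000 simulations the Monte Carlo standard error is under 0.01, tight enough to confirm the ballpark claim without an analytic noncentral-t calculation.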
17. Planned Control Theory Analysis: Submovement Models – Trajectory-based metrics not yet implemented at current interim N
Research Question: How does the control loop efficiency differ across conditions? Do adaptive interventions reduce movement corrections?
Status: ⚠️ PLANNED ANALYSIS - No trajectory-based models fit yet at current interim N (N=26).
Planned Sample Size & Power
Trajectory-based kinematic metrics (velocity profiles, jerk, normalized jerk, primary vs corrective phases) are rich but correlated and often noisier than basic RT/TP measures. Because they are derived from the same trial-level data, their within-subject effect sizes are likely small-to-medium, with substantial individual differences. For these analyses, N = 48 is adequate for descriptive modeling and estimation, while N = 64 is a good target if stronger inferential claims about UI-mode improvements in movement smoothness or control-loop efficiency are planned. As with LBA, simulation-based power analyses tailored to these specific metrics would be ideal but are beyond the scope of this report (Kumle et al., 2021).
This section documents the planned control theory analysis. Submovement metrics in this report are limited to pre-computed submovement_count (see Section 10). Full trajectory-based control-theory models (jerk, duration-normalized jerk, primary vs corrective phases) will be implemented once trajectory logging is complete across participants. No trajectory-based models have been fit for the current interim dataset; this is included for transparency and to guide future work.
The Optimized Submovement Model [@meyer1988] posits that pointing movements are composed of a primary ballistic impulse followed by n corrective submovements. The Submovement Count (N_sub) serves as a proxy for the efficiency of the control loop. In gaze-based interaction, simulated lag and saccadic blindness force users into an intermittent control regime, theoretically increasing N_sub.
Power Analysis Summary:
- N=48 is sufficient for medium main effects (dz≈0.41, power≈0.80)
- Interactions will be underpowered unless large (treat as exploratory)
- 60fps trajectory data improves measurement precision but doesn't increase effective N
- Key considerations: use duration-normalized smoothness metrics, control for multiple comparisons (FDR), pre-specify outcomes
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed recommendations
⚠️ **Advanced Control Theory Analysis Not Yet Implemented**
This section will analyze movement control using the Optimized Submovement Model.
- Basic submovement_count analysis: ✅ DONE (Section 10)
- Velocity profile analysis: ❌ PENDING
- Submovement detection algorithm: ❌ PENDING
- Primary vs. corrective movement decomposition: ❌ PENDING
✅ Submovement count data available:
- N trials with submovement data: 530
- Mean submovements per trial: 7.47
- Range: 0 - 70
**Submovement Count by Modality and UI Mode:**
|modality |ui_mode | Mean_Submov| SD_Submov|
|:--------|:--------|-----------:|---------:|
|hand |static | 0.00| 0.00|
|hand |adaptive | 0.00| 0.00|
|gaze |static | 17.79| 9.74|
|gaze |adaptive | 15.76| 7.81|
**Data Quality Check:**
✅ Trajectory data available in CSV:
- N trials with trajectory: 3348
- Trajectory stored as JSON string in 'trajectory' column
- Can be parsed in R: jsonlite::fromJSON(trajectory)
- Current analysis uses pre-calculated submovement_count from FittsTask.tsx
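Since the trajectory column stores a JSON string (parsed in R with `jsonlite::fromJSON`), the same parsing-and-derivation step can be sketched in Python; the `{"x", "y", "t"}` field names are an assumption about the logged schema, not confirmed:

```python
import json

# Sketch: parse one trial's 'trajectory' JSON string into samples and derive
# per-segment speed. Field names and ms timestamps are assumptions.
raw = '[{"x": 0, "y": 0, "t": 0}, {"x": 30, "y": 40, "t": 100}, {"x": 60, "y": 80, "t": 200}]'
samples = json.loads(raw)

speeds_px_per_s = []
for prev, cur in zip(samples, samples[1:]):
    dx, dy = cur["x"] - prev["x"], cur["y"] - prev["y"]
    dt = (cur["t"] - prev["t"]) / 1000.0   # ms -> s
    speeds_px_per_s.append((dx * dx + dy * dy) ** 0.5 / dt)

print(speeds_px_per_s)
```

At ~60fps, dt should be roughly 16.7 ms per segment; dropped frames would show up as larger gaps and should be handled (or flagged) before computing accelerations.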
**Next Steps for Advanced Analysis:**
1. If cursor trajectory data is needed, add logging to FittsTask.tsx
2. Implement velocity profile extraction from trajectory
3. Detect submovements using zero-crossings in acceleration profile
4. Decompose primary vs. corrective movements
5. Compare control loop efficiency across conditions
6. Test hypothesis: Adaptive UI → fewer submovements (more ballistic)
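Steps 3–4 above can be sketched as counting deceleration-to-reacceleration reversals in the speed profile, a common operationalization of corrective submovements. A minimal Python sketch on a synthetic profile (thresholds and data are illustrative):

```python
# Sketch: count submovements as points where acceleration crosses from
# negative to non-negative, i.e. a deceleration followed by re-acceleration.
# Each such reversal marks the onset of a corrective submovement.
def count_submovements(speed):
    accel = [b - a for a, b in zip(speed, speed[1:])]
    count = 0
    for prev, cur in zip(accel, accel[1:]):
        if prev < 0 <= cur:        # speed minimum: correction begins
            count += 1
    return count

# One ballistic peak followed by two corrective bumps -> 2 submovements.
speed = [0, 40, 90, 60, 30, 45, 20, 10, 18, 5, 0]
print(count_submovements(speed))  # 2
```

A production version would low-pass filter the speed profile first and impose minimum-amplitude/duration criteria, otherwise sensor noise inflates the count.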
⚠️ **Advanced Control Theory Metrics Not Yet Implemented**
1. **Velocity Profile Analysis:**
- Peak velocity extraction
- Time to peak velocity (TPV)
- Deceleration phase duration
- Velocity profile asymmetry
2. **Submovement Detection:**
- Zero-crossing detection in acceleration profile
- Primary movement identification (first ballistic phase)
- Corrective submovement count and duration
- Inter-submovement intervals
3. **Control Loop Efficiency:**
- Ratio of primary to total movement time
- Correction frequency (submovements per second)
- Movement smoothness metrics (jerk, normalized jerk - MUST be duration-normalized)
4. **Modality-Specific Patterns:**
- Gaze: Intermittent control due to lag and saccadic blindness
- Hand: Continuous control with proprioceptive feedback
- Adaptive: Reduced corrections due to target expansion/declutter
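The duration-normalization requirement flagged above can be made concrete with a dimensionless squared-jerk metric (cf. Hogan & Sternad's dimensionless jerk). A Python sketch on synthetic 60 Hz traces (the position profiles are illustrative):

```python
import math

# Sketch: dimensionless (duration-normalized) squared-jerk smoothness metric,
#   integral(jerk^2 dt) * D^5 / A^2
# where D is movement duration and A is amplitude. Without the D^5/A^2
# normalization, raw jerk trivially penalizes longer movements.
def dimensionless_jerk(position, dt):
    vel = [(b - a) / dt for a, b in zip(position, position[1:])]
    acc = [(b - a) / dt for a, b in zip(vel, vel[1:])]
    jerk = [(b - a) / dt for a, b in zip(acc, acc[1:])]
    duration = dt * (len(position) - 1)
    amplitude = abs(position[-1] - position[0])
    integral = sum(j * j for j in jerk) * dt
    return integral * duration ** 5 / amplitude ** 2

# A smooth cosine-ramp reach scores lower than the same reach with jitter.
smooth = [10 * (1 - math.cos(math.pi * i / 20)) / 2 for i in range(21)]
jittery = [p + (0.3 if i % 2 else -0.3) for i, p in enumerate(smooth)]
print(dimensionless_jerk(smooth, 1 / 60) < dimensionless_jerk(jittery, 1 / 60))  # True
```

Because the metric is dimensionless, it can be compared across conditions with different movement times, which is exactly why duration normalization is listed as a requirement above.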
✅ Trajectory data is now available in 'trajectory' column (JSON string, ~60fps)
✅ Current CSV has submovement_count (pre-calculated) AND raw trajectory
**Power & Analysis Considerations:**
- N=48 is sufficient for main effects (dz≈0.41, power≈0.80)
- Interactions: Underpowered, treat as exploratory
- 60fps improves measurement precision but doesn't increase effective N
- Use duration-normalized smoothness metrics
- Control for multiple comparisons (FDR) if testing many metrics
- Pre-specify theoretically motivated outcomes
**See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed recommendations**
Implementation Notes:
- Basic submovement analysis is already in Section 10 (Movement Quality Metrics)
- Trajectory data is now available in the trajectory column (JSON string, logged at ~60fps)
- Current submovement_count is pre-calculated in FittsTask.tsx using velocity peaks
- Power: N=48 sufficient for main effects (dz≈0.41, power≈0.80); interactions underpowered (treat as exploratory)
- Key considerations:
  - Use duration-normalized smoothness metrics (jerk is duration-sensitive)
  - Control for multiple comparisons (FDR) if testing many kinematic features
  - Pre-specify a small set of theoretically motivated outcomes
  - 60fps improves measurement precision but doesn't increase effective N
- See POWER_ANALYSIS_EXPERT_RESPONSE.md for detailed power analysis and recommendations
Potential Issues to Check:
- Verify that the submovement_count calculation in FittsTask.tsx matches the Optimized Submovement Model definition
- Check whether velocity profile data is needed or if pre-calculated counts are sufficient
- Ensure the submovement detection algorithm handles both hand and gaze modalities correctly
18. Summary & Conclusions
Key Findings Summary
Summary of Key Metrics by Condition (Interim N=26)
| Modality | UI Mode | Metric | Mean | SD |
|:---------|:--------|:-------|-----:|---:|
| hand | static | Effective Width (px) | 32.740 | 20.890 |
| hand | adaptive | Effective Width (px) | 33.140 | 21.000 |
| gaze | static | Effective Width (px) | 36.020 | 20.450 |
| gaze | adaptive | Effective Width (px) | 36.090 | 19.620 |
| hand | static | Error Rate (%) | 4.870 | 21.520 |
| hand | adaptive | Error Rate (%) | 5.010 | 21.820 |
| gaze | static | Error Rate (%) | 23.480 | 42.400 |
| gaze | adaptive | Error Rate (%) | 21.020 | 40.760 |
| hand | static | Movement Time (s) | 1.109 | 0.350 |
| hand | adaptive | Movement Time (s) | 1.089 | 0.317 |
| gaze | static | Movement Time (s) | 1.230 | 0.511 |
| gaze | adaptive | Movement Time (s) | 1.313 | 0.642 |
| hand | static | Throughput (bits/s) | 3.510 | 0.930 |
| hand | adaptive | Throughput (bits/s) | 3.550 | 0.950 |
| gaze | static | Throughput (bits/s) | 3.110 | 1.100 |
| gaze | adaptive | Throughput (bits/s) | 3.000 | 1.100 |
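For reference, a minimal Python sketch of how effective width and throughput relate, assuming the standard ISO 9241-9-style effective measures were used for these summaries (the endpoint deviations, amplitudes, and MTs below are illustrative values, not study data):

```python
import math
import statistics

# Sketch of effective throughput (ISO 9241-9 style):
#   W_e  = 4.133 * SD(endpoint deviation along the task axis)
#   ID_e = log2(A_e / W_e + 1)
#   TP   = ID_e / mean movement time
def effective_throughput(endpoint_dev_px, amplitude_px, mt_s):
    w_e = 4.133 * statistics.stdev(endpoint_dev_px)
    id_e = math.log2(statistics.mean(amplitude_px) / w_e + 1)
    return id_e / statistics.mean(mt_s)

devs = [3.1, -5.4, 7.9, -2.2, 0.8, 6.5, -4.0, 1.7]   # px, signed along task axis
amps = [370.0] * len(devs)                            # px
mts = [1.1] * len(devs)                               # s
print(round(effective_throughput(devs, amps, mts), 2))
```

This makes the table's pattern interpretable: gaze's larger effective width (more endpoint scatter) and longer movement times jointly drive its lower throughput.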
Data Quality Notes
- Participants: 26
- Valid Trials: 4734 (out of 5481 total experimental trials)
- Exclusion Rate: 14% (due to errors, timeouts, or invalid RTs)
- Trials per Participant: Mean = 182.1, Range = 99 - 213
Participant Exclusions
Excluded Participants: Seven participants (P002, P003, P007, P008, P015, P039, P040) were excluded from the main 2×2×2 factorial analysis due to a data logging error.
Reason: A bug in the data logging code (fixed December 8, 2025, commit 04758db) incorrectly recorded all trials as pressure = 1 regardless of block condition. The bug was caused by passing the pressure value (always 1.0) instead of the pressure condition boolean (pressureEnabled) to the logging function in TaskPane.tsx line 1105.
Impact: - All 7 affected participants have only pressure = 1 data - Modality and UI Mode were logged correctly (0 mismatches) - Without both pressure conditions (0 and 1), these participants cannot contribute to the full factorial model
Resolution: - Bug fixed and deployed (commit 04758db) - Seven replacement participants (P049-P055) will be collected to maintain planned N=48 - Affected participants’ data retained for exploratory analyses
Current Sample (Interim): N=26 participants with complete data across all experimental conditions.
Planned Final Sample: N=48 participants with complete data across all experimental conditions (not yet achieved in this interim report).
Note: All summary statistics above are based on the current interim sample (N=26). Effect sizes and p-values may change as more data are collected.
For detailed exclusion criteria, see EXCLUSION_CRITERIA.md. For technical audit details, see AUDIT_REPORT.md.